Background :

McCurr Consultancy is an MNC with thousands of employees spread across the globe. The company believes in hiring the best available talent and retaining it for as long as possible, and a huge amount of resources is spent on retaining existing employees through various initiatives. The Head of People Operations wants to bring down the cost of retaining employees, and proposes limiting incentives to only those employees who are at risk of attrition. As a recently hired Data Scientist in the People Operations Department, you have been asked to identify patterns in the characteristics of employees who leave the organization, and to use this information to predict whether an employee is at risk of attrition so that such employees can be targeted with incentives.

Objective :

To identify patterns in the characteristics of employees who leave the organization, and to use them to predict whether an employee is at risk of attrition.

Dataset :

The data contains demographic details, work-related metrics, and an attrition flag.

In the real world, you will not find definitions for some of your variables. It is part of the analysis to figure out what they might mean.

Import necessary libraries

Read the dataset

View the first and last 5 rows of the dataset.

Understand the shape of the dataset.

Check the data types of the columns for the dataset.
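
The loading-and-inspection steps above can be sketched as follows. The filename and column names here are hypothetical stand-ins for the actual HR dataset; a small inline sample is used so the snippet is self-contained:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for the HR attrition CSV;
# in the notebook this would be pd.read_csv("<actual filename>.csv").
csv_data = StringIO(
    "Age,Department,MonthlyIncome,Attrition\n"
    "41,Sales,5993,Yes\n"
    "49,Research,5130,No\n"
    "37,Research,2090,Yes\n"
)
df = pd.read_csv(csv_data)

print(df.head())   # first 5 rows (all 3 here)
print(df.tail())   # last 5 rows
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
```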

Observations -

Converting "object" columns to "category" reduces the memory required to store the dataframe.

Fixing the data types
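
A minimal sketch of the object-to-category conversion, using a toy column with repeated string values (an assumption: the real columns such as Department, Gender, and OverTime behave similarly):

```python
import pandas as pd

# Toy frame with a low-cardinality string column (hypothetical values).
df = pd.DataFrame({"Department": ["Sales", "HR", "Sales", "HR"] * 250})

before = df.memory_usage(deep=True).sum()

# Cast every object column to category; pandas then stores each distinct
# string once plus small integer codes, instead of one string per row.
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype("category")

after = df.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```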

We can see that the memory usage has decreased from 804 KB to 624.4 KB. This technique is especially useful for larger datasets.

Summary of the dataset.

Dropping columns which are not adding any information.

Let's look at the unique values of all the categorical columns.
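
One way to list the distinct values of every categorical column is a short loop over the object/category dtypes (the columns and values below are hypothetical stand-ins):

```python
import pandas as pd

# Hypothetical slice of the dataset.
df = pd.DataFrame({
    "Department": ["Sales", "HR", "Sales"],
    "OverTime": ["Yes", "No", "Yes"],
    "Age": [41, 49, 37],
})

# Print the distinct values of every categorical/object column.
for col in df.select_dtypes(include=["object", "category"]).columns:
    print(col, ":", df[col].unique().tolist())
```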

Note: The EDA section of the notebook has been covered multiple times in previous case studies. For this discussion, the EDA section can be skipped in favor of the EDA summary below; the detailed EDA can be found in the Appendix section.

The three functions below need to be defined to carry out the Exploratory Data Analysis.

Summary of EDA

Data Cleaning:

Observations from EDA:

Bivariate Analysis

Model Building - Approach

  1. Data preparation
  2. Partition the data into train and test sets.
  3. Build the model on the train data.
  4. Tune the model if required.
  5. Evaluate the model on the test set.

Split Data
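
The partition step can be sketched with `train_test_split`; the features, attrition labels, and split ratio below are illustrative assumptions, and stratifying on the target keeps the attrition ratio the same in both sets:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: X = predictors, y = attrition flag (1 = attrites).
X = pd.DataFrame({
    "Age": range(20, 60),
    "MonthlyIncome": range(2000, 6000, 100),
})
y = pd.Series([0, 1] * 20)

# stratify=y preserves the class ratio in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
print(X_train.shape, X_test.shape)
```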

Model evaluation criterion

The model can make two kinds of wrong predictions:

  1. Predicting an employee will attrite when the employee does not attrite (false positive)
  2. Predicting an employee will not attrite when the employee attrites (false negative)

Which case is more important?

How can this loss be reduced, i.e., how do we reduce false negatives?

Let's define a function to provide metric scores (accuracy, recall, and precision) on the train and test sets, and a function to show the confusion matrix, so that we do not have to repeat the same code while evaluating models.
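
The two helpers might look like the sketch below (function names and the dataframe layout are assumptions; the notebook's own versions may also plot the matrix with a heatmap):

```python
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

def model_performance(model, X, y):
    """Return accuracy, recall, and precision of a fitted model on (X, y)."""
    pred = model.predict(X)
    return pd.DataFrame(
        {
            "Accuracy": [accuracy_score(y, pred)],
            "Recall": [recall_score(y, pred)],
            "Precision": [precision_score(y, pred)],
        }
    )

def show_confusion_matrix(model, X, y):
    """Return the confusion matrix [[TN, FP], [FN, TP]] of a fitted model."""
    return confusion_matrix(y, model.predict(X))
```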

Build Decision Tree Model
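
A minimal sketch of this step on synthetic data (the class imbalance here roughly mimics an attrition-style dataset; the real notebook would use the train/test split created earlier). An unconstrained tree typically fits the training data perfectly, which is why train and test recall are compared:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the attrition dataset.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.84, 0.16], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Default (unpruned) decision tree.
dtree = DecisionTreeClassifier(random_state=1)
dtree.fit(X_train, y_train)

print("train recall:", recall_score(y_train, dtree.predict(X_train)))
print("test recall:", recall_score(y_test, dtree.predict(X_test)))
```

The gap between perfect train recall and lower test recall is the usual sign of overfitting that the later pruning/tuning steps address.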

Confusion Matrix -

Bagging Classifier
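
Bagging trains many copies of a base learner (a decision tree by default) on bootstrap samples and takes a majority vote. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic imbalanced data standing in for the attrition dataset.
X, y = make_classification(
    n_samples=300, n_features=6, weights=[0.8, 0.2], random_state=1
)

# 50 bootstrap-trained trees; predictions are a majority vote.
bagging = BaggingClassifier(n_estimators=50, random_state=1)
bagging.fit(X, y)
print("training accuracy:", bagging.score(X, y))
```

Averaging over bootstrap samples reduces the variance of a single deep tree while keeping its low bias.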

Bagging Classifier with weighted decision tree

Random Forest

Random forest with class weights
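
With scikit-learn's `RandomForestClassifier`, class weights can be passed directly; `class_weight="balanced"` up-weights the minority (attrition) class, which pushes the forest toward fewer false negatives. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

# Synthetic imbalanced data standing in for the attrition dataset.
X, y = make_classification(
    n_samples=400, n_features=6, weights=[0.85, 0.15], random_state=1
)

# "balanced" sets weights inversely proportional to class frequencies.
rf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=1
)
rf.fit(X, y)
print("train recall:", recall_score(y, rf.predict(X)))
```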

Tuning Models

Using GridSearchCV for hyperparameter tuning
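
`GridSearchCV` exhaustively cross-validates every combination in a parameter grid. The grid below is illustrative; `scoring="recall"` is chosen because false negatives (missed attriters) are the costly error here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the attrition dataset.
X, y = make_classification(
    n_samples=300, n_features=6, weights=[0.8, 0.2], random_state=1
)

# Candidate hyperparameters; optimize recall via 5-fold cross-validation.
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5, 10]}
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="recall",
    cv=5,
)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
```

After fitting, `grid.best_estimator_` holds the tree refit on all the data with the winning parameters.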

Tuning Decision Tree

Tuning Bagging Classifier

Tuning Random Forest

Comparing all the models

Feature importance of Random Forest
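
Feature importances from a fitted forest sum to 1 and can be sorted to surface the strongest predictors (the feature names below are hypothetical stand-ins for the dataset's columns):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical attrition-style feature names.
X, y = make_classification(n_samples=300, n_features=5, random_state=1)
cols = ["Age", "MonthlyIncome", "OverTime", "YearsAtCompany", "DistanceFromHome"]

rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X, y)

# Impurity-based importances; higher = larger share of the splits' gain.
importances = pd.Series(rf.feature_importances_, index=cols)
print(importances.sort_values(ascending=False))
```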

Business Insights and Recommendations

Appendix

Univariate analysis

Observations on Age

Observations on DailyRate

Observations on DistanceFromHome

Observations on HourlyRate

Observations on MonthlyIncome

Observations on MonthlyRate

Observations on NumCompaniesWorked

Observations on PercentSalaryHike

Observations on TotalWorkingYears

Observations on YearsAtCompany

Observations on YearsInCurrentRole

Observations on YearsWithCurrManager

Observations on BusinessTravel

Observations on Department

Observations on EducationField

Observations on Gender

Observations on JobRole

Observations on MaritalStatus

Observations on OverTime

Observations on Attrition

Bivariate Analysis

Attrition vs Earnings of employee

Attrition vs Years working in company

Attrition vs Previous job roles

Checking if performance rating and salary hike are related -
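
One simple check is the mean hike per rating level plus the correlation between the two columns (the values below are hypothetical; the column names follow the dataset's naming style):

```python
import pandas as pd

# Hypothetical sample of the two columns.
df = pd.DataFrame({
    "PerformanceRating": [3, 3, 4, 4, 3, 4],
    "PercentSalaryHike": [12, 13, 21, 22, 11, 23],
})

# Average hike per rating level shows whether the two move together.
print(df.groupby("PerformanceRating")["PercentSalaryHike"].mean())
print(df["PerformanceRating"].corr(df["PercentSalaryHike"]))
```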

Observations -

To jump back to the EDA summary section, click here.